Action recognition in video

Author

  • Pierre Kreitmann
Abstract

Automatic action recognition in video has a broad array of applications, from surveillance to interactive video games. Classic algorithms usually use handcrafted descriptors such as SIFT (see [5]) or HOG (see [3]) to compute feature vectors of videos, and have achieved promising results in the past (see [7]). More recently, Quoc Le and Will Zou at the Stanford AI Lab have shown that ISA (independent subspace analysis) features obtained from unsupervised learning achieve higher performance while being much faster to engineer than handcrafted features (their work is not yet published). SFA (slow feature analysis) features have achieved good results in object recognition as well as in position and rotation extraction from artificial video signals (see [4]). In this work, we experiment with SFA features for action recognition.

1 Evaluation framework

Our pipeline for action recognition follows the one presented in [7]. It first uses a dense detector to split the videos into spatio-temporal chunks. It then computes the SFA feature vector of each chunk and clusters all those feature vectors (from all the videos). Finally, it defines the feature vector of a video to be the vector (h_i), where h_i = 1 if and only if at least one chunk in the video falls in the i-th cluster. Those feature vectors are then used by an SVM with a χ² kernel for classification.

2 One-layer SFA

We first experiment with a one-layer SFA. For all our experiments, we use movies from the Hollywood2 dataset (see [6]) at half the original resolution, i.e. around 300 × 150 px. The patches are 10 × 10 px, so the input dimension is 100. A first SFA pass reduces the dimension to 48. A quadratic expansion is then performed, and a second SFA pass selects the 32 slowest features, so that the output dimension is 32. Since each spatio-temporal chunk spans 10 frames, the feature vector of a chunk has dimension 32 × 10 = 320. With the full dataset, the measured accuracy is 22.4 %. For comparison, a random guess would have an accuracy of 8.3 %. However, state-of-the-art
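The dimensions quoted above (100 → 48 → quadratic expansion → 32 per frame, 320 per chunk) and the binary video descriptor can be traced with a minimal NumPy sketch. This is an illustration under stated assumptions, not the author's implementation: the function names are hypothetical, the input is synthetic noise standing in for flattened 10 × 10 patches, a chunk is taken to be 10 consecutive frames of a single patch location, the cluster assignments are made up, and the exponential form of the χ² kernel is one common choice that the abstract does not specify.

```python
import numpy as np

def fit_linear_sfa(x, n_out):
    """Fit linear SFA on a time series x of shape (T, d).

    Returns (mean, w) so that (x - mean) @ w yields the n_out slowest
    decorrelated, unit-variance features.
    """
    mean = x.mean(axis=0)
    xc = x - mean
    # Whitening: rotate and rescale so the covariance becomes the identity.
    evals, evecs = np.linalg.eigh(np.cov(xc, rowvar=False))
    keep = evals > 1e-10  # drop numerically degenerate directions
    whiten = evecs[:, keep] / np.sqrt(evals[keep])
    # Slowness: keep directions whose temporal difference has least variance.
    dz = np.diff(xc @ whiten, axis=0)
    _, devecs = np.linalg.eigh(np.cov(dz, rowvar=False))  # ascending order
    return mean, whiten @ devecs[:, :n_out]

def quadratic_expand(x):
    """Append all monomials x_i * x_j (i <= j) to the linear terms."""
    i, j = np.triu_indices(x.shape[1])
    return np.hstack([x, x[:, i] * x[:, j]])

def binary_histogram(cluster_ids, n_clusters):
    """h_i = 1 iff at least one chunk of the video falls in cluster i."""
    h = np.zeros(n_clusters)
    h[np.unique(cluster_ids)] = 1.0
    return h

def chi2_kernel(a, b, gamma=1.0):
    """Exponential chi-squared kernel (assumed form) between rows of a and b."""
    diff = a[:, None, :] - b[None, :, :]
    total = a[:, None, :] + b[None, :, :]
    # Terms where both entries are zero contribute nothing to the distance.
    ratio = np.divide(diff**2, total, out=np.zeros_like(total), where=total > 0)
    return np.exp(-gamma * ratio.sum(axis=-1))

# Synthetic stand-in for 3000 frames of one flattened 10 x 10 patch.
rng = np.random.default_rng(0)
patches = rng.standard_normal((3000, 100))

mean1, w1 = fit_linear_sfa(patches, 48)              # 100 -> 48
expanded = quadratic_expand((patches - mean1) @ w1)  # 48 -> 48 + 1176 = 1224
mean2, w2 = fit_linear_sfa(expanded, 32)             # 1224 -> 32
features = (expanded - mean2) @ w2

chunk = features[:10].ravel()  # 10 frames x 32 features = 320 dimensions

# Hypothetical cluster assignments for the chunks of two videos, 8 clusters.
h1 = binary_histogram(np.array([0, 2, 2, 5]), 8)
h2 = binary_histogram(np.array([1, 2, 5]), 8)
K = chi2_kernel(np.stack([h1, h2]), np.stack([h1, h2]))
print(expanded.shape, features.shape, chunk.shape)
```

A Gram matrix such as `K` could then be passed to any SVM implementation that accepts precomputed kernels. The clustering step itself is omitted here because the abstract does not name the algorithm.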

Publication date: 2010